-
Notifications
You must be signed in to change notification settings - Fork 220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Cumulative reduction (max, min, sum, prod) forward with small last dim #3182
base: develop
Are you sure you want to change the base?
Conversation
…ction_improvedOverROCM
__device__ virtual inline bool isbetter(const T& /*a*/, const T& /*b*/) const { return false; } | ||
__device__ virtual inline void combine(T& a, T b) const { a = b; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use CRTP instead of virtual functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved. Please take a look.
if(contiguous != 0 && contiguous != 1) | ||
std::cerr << "Error Tensor Contiguous should be 0 or 1" << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be bool
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved
auto inner_size = miopen::deref(inputDesc).GetLengths()[true_dim]; | ||
auto outer_size = size / inner_size; | ||
|
||
auto op_worker = reduce_func<OP, float, int>{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bad idea using the same function to verify itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand your comment clearly. reduce_func
template struct has been implemented independent with that is used in MIOpenCumulativeReduction
. In fact, their code are different.
…ction_improvedOverROCM
…OverROCM' into impl_cumulative_reduction_improvedOverROCM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR is blocked by the CI issue we recently have. After the CI issue is resolved, this PR need to restart to run through the CI.
…OverROCM' into impl_cumulative_reduction_improvedOverROCM
…OverROCM' into impl_cumulative_reduction_improvedOverROCM
dim
with size smaller or equal to256
andstride
value at thatdim
of both input, output and indices tensor must equal to 1. For that reason,IsApplicable
constraint makes sure that the operation only works with the above case.float16
float32
bfloat16
Average over all cases: